Lightweight Client-Side Chinese/Japanese Morphological Analyzer Based on Online Learning
نویسندگان
چکیده
As mobile devices and Web applications become popular, lightweight, client-side language analysis is more important than ever. We propose Rakuten MA, a Chinese/Japanese morphological analyzer written in JavaScript. It employs an online learning algorithm SCW, which enables client-side model update and domain adaptation. We have achieved a compact model size (5MB) while maintaining the state-of-the-art performance, via techniques such as feature hashing, FOBOS, and feature quantization.
منابع مشابه
A Morphological Analyzer for Japanese Nouns, Verbs and Adjectives
We present an open source morphological analyzer for Japanese nouns, verbs and adjectives. The system builds upon the morphological analyzing capabilities of MeCab [Matsumoto et al., 1999] to incorporate finer details of classification such as politeness, tense, mood and voice attributes. We implemented our analyzer in the form of a finite state transducer using the open source finite state com...
متن کاملHMM Parameter Learning for Japanese Morphological Analyzer
This paper presents a method to apply Hidden Markov Model (HMM) to parameter learning for Japanese morphological analyzer. We especially emphasize how the following two information sources affect the results of the parameter learning: 1) The initial value of parameters, i.e., the initial probabilities and 2) some grammatical constraints that hold in Japanese sentences independently of any domai...
متن کاملCas-analyzer: an online tool for assessing genome editing results using NGS data
Genome editing with programmable nucleases has been widely adopted in research and medicine. Next generation sequencing (NGS) platforms are now widely used for measuring the frequencies of mutations induced by CRISPR-Cas9 and other programmable nucleases. Here, we present an online tool, Cas-Analyzer, a JavaScript-based implementation for NGS data analysis. Because Cas-Analyzer is completely us...
متن کاملSmoothing a Lexicon-based POS Tagger for Arabic and Hebrew
We propose an enhanced Part-of-Speech (POS) tagger of Semitic languages that treats Modern Standard Arabic (henceforth Arabic) and Modern Hebrew (henceforth Hebrew) using the same probabilistic model and architectural setting. We start out by porting an existing Hidden Markov Model POS tagger for Hebrew to Arabic by exchanging a morphological analyzer for Hebrew with Buckwalter's (2002) morphol...
متن کاملOnline Acquisition of Japanese Unknown Morphemes using Morphological Constraints
We propose a novel lexicon acquirer that works in concert with the morphological analyzer and has the ability to run in online mode. Every time a sentence is analyzed, it detects unknown morphemes, enumerates candidates and selects the best candidates by comparing multiple examples kept in the storage. When a morpheme is unambiguously selected, the lexicon acquirer updates the dictionary of the...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014